In this lesson, we explore how to handle “discovered, but currently not indexed” warnings in Google Search Console. Learn which warnings can be ignored and which require your immediate attention. Follow our guide on how to analyze affected URLs and fully understand how the Googlebot works and thinks.
Have a Question?
To ask a practical question:
To help you with obstacles that come up while you are working for an employer or a client, we need much more input. Because that data is sensitive, SEONAUTs can ask the mentor private one-on-one questions and provide more details.
To ask a theoretical question:
We love the SEOLAXY community, and we have provided free answers on YouTube for many years. Today it is simply no longer possible to answer every question. But we are still committed to answering all theoretical questions and questions about the lessons in the YouTube comments.
Lesson Transcript
Good or Bad Warning?
First, the good news: in many cases you don't need to fix the issue at all. Not everything you see in Google Search Console is an error. Sometimes it is just a warning, which you can read as a suggestion to take a closer look at something. Now the bad news: depending on the page type the warning appears for, it should sometimes be taken very seriously.
What to Ignore and What to Take Seriously?
So which “discovered but currently not indexed” warnings can be ignored, and which should be taken seriously? First, click on the warning to get a list of the affected URLs. You can also download that list and analyze it, for example in Google Sheets. Second, group the URLs by page type, or simply scan the list for a minute or two and see which page type predominates. Usually you will find a lot of category paging URLs that have wrongly been set to index. This is also why the Googlebot discovered them but did not index them: they are not worth indexing, and you shouldn't want them indexed either, because they give users little value and users would find it odd to land on, for example, page six of a category when coming from Google. If the majority of the URLs aren't paging or sorting URLs, you should dig deeper, especially if you find category or product URLs there. In that case, the most common reason they are not indexed is that Google considers those pages thin content: either Google can't work out what they are about because they have little content, or, in the case of categories, they contain too few products.
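Once you have downloaded the list of affected URLs, a short script can do the grouping step for you instead of eyeballing it. This is a minimal sketch, not a finished tool: the URL patterns (`?page=`, `/page/N`, `?sort=` / `?order=`, `/category/`, `/product/`) are hypothetical and must be adapted to how your shop actually builds its URLs, and `page_type` and `summarize` are illustrative helper names.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL patterns: adapt them to your own shop's URL structure.
PAGING_PATH = re.compile(r"/page/\d+/?$")
PAGE_TYPE_RULES = [
    ("category", re.compile(r"^/(category|c)/")),
    ("product", re.compile(r"^/(product|p)/")),
]


def page_type(url: str) -> str:
    """Assign one page type to a URL, checking paging and sorting first,
    because a paged or sorted category URL is still a paging/sorting URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    if "page" in params or PAGING_PATH.search(parts.path):
        return "paging"
    if "sort" in params or "order" in params:
        return "sorting"
    for name, pattern in PAGE_TYPE_RULES:
        if pattern.search(parts.path):
            return name
    return "other"


def summarize(urls):
    """Count how many affected URLs fall into each page type."""
    return Counter(page_type(url) for url in urls)
```

If you paste the exported URLs into a text file, one per line, `summarize(open("urls.txt").read().split())` gives you the counts (the file name is just an example). If `paging` or `sorting` dominates, the warning is most likely harmless; a large `category` or `product` share is the signal to dig deeper.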
Overview of How the Googlebot Works
But if you want to fully understand why Google indexes some URLs and not others, and why some URLs should not be indexed at all, you need to understand how the Googlebot works and thinks. The last lesson gave you an overview of the theory. If you haven't watched it yet, do so now and come back to this lesson afterwards. Now we are going to take a practical approach and see how you can apply it.
Let's create an overview of how the Googlebot works. We will show the path as a flowchart and simplify it a bit for easier understanding. In the SEOLAXY ACADEMY lessons we will dig deeper, but for now this is more than enough.
It all starts with the Googlebot requesting the domain name, for example myonlinestore.com. That triggers a request to the DNS server, for example ns1.myhosting.com, a lookup system that finds the IP address belonging to a domain name. If there is more than one domain associated with that IP address, resolving it takes a few milliseconds longer. Once the main folder of the website has been found, the server starts opening it. But first the server checks whether an SSL certificate is assigned to that domain. If no SSL certificate is assigned, it will try to open the HTTP version of the domain; if one is assigned, it will try to open the HTTPS version.
Every server has a default version of a domain: either the non-www or the www version. In both cases, the .htaccess file or another configuration file redirects the Googlebot to the version the website owner prefers, so there can be a redirect from the www to the non-www version or vice versa. Next, the Googlebot wants to open the robots.txt file, which must always be located in the root folder. That is where we tell the Googlebot whether or not to enter the website.
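The first steps of the flowchart can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not how Googlebot is actually implemented: `resolve_ip`, `canonical_url`, `robots_url` and `may_crawl` are hypothetical helper names, the `https_available` and `prefer_www` flags stand in for the SSL certificate and the redirect rule in .htaccess, and the robots.txt rules are made-up examples. The robots.txt check uses the standard library's `urllib.robotparser`, which does simple prefix matching and does not implement every rule Google supports.

```python
import socket
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def resolve_ip(host: str) -> str:
    """DNS step: ask the resolver for the IP address of a host name.
    Needs network access (or a hosts-file entry) to succeed."""
    return socket.gethostbyname(host)


def canonical_url(url: str, https_available: bool, prefer_www: bool) -> str:
    """Hypothetical model of the redirects: HTTPS if an SSL certificate is
    assigned, otherwise HTTP, plus the www or non-www host the owner chose.
    Ports and fragments are dropped for simplicity."""
    parts = urlsplit(url)
    scheme = "https" if https_available else "http"
    host = parts.hostname or ""
    bare_host = host[len("www."):] if host.startswith("www.") else host
    host = "www." + bare_host if prefer_www else bare_host
    return urlunsplit((scheme, host, parts.path or "/", parts.query, ""))


def robots_url(url: str) -> str:
    """robots.txt always sits in the root folder of the host."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Check a URL against robots.txt rules that were already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Example rules (assumed, not taken from a real shop).
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /search
"""
```

For example, `canonical_url("http://myonlinestore.com/category/shoes", https_available=True, prefer_www=True)` returns `"https://www.myonlinestore.com/category/shoes"`, its robots.txt lives at `https://www.myonlinestore.com/robots.txt`, and with the example rules `may_crawl` allows the category page but blocks `/checkout/cart`.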
Next Episode Preview
If you got value out of this lesson, please consider subscribing to our YouTube channel. In the next lesson, we will take a look at the .htaccess and robots.txt files: how to set up domain redirects and how to allow or disallow the Googlebot to enter the website or particular parts of our online store. See you next time!